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Abstract 



Approximate Bayesian computation (ABC) methods perform inference on 
model-specific parameters of mechanistically motivated parametric statis- 
tical models when evaluating likelihoods is difficult. Central to the suc- 
cess of ABC methods is computationally inexpensive simulation of data sets 
from the parametric model of interest. However, when simulating data sets 
from a model is so computationally expensive that the posterior distribu- 
tion of parameters cannot be adequately sampled by ABC, inference is not 
straightforward. We present "approximate approximate Bayesian computa- 
tion" (AABC), a class of methods that extends simulation-based inference 
by ABC to models in which simulating data is expensive. In AABC, we 
first simulate a limited number of data sets that is computationally feasi- 
ble to simulate from the parametric model. We use these data sets as fixed 
background information to inform a non-mechanistic statistical model that 
approximates the correct parametric model and enables efficient simulation 
of a large number of data sets by Bayesian resampling methods. We show 
that under mild assumptions, the posterior distribution obtained by AABC 
converges to the posterior distribution obtained by ABC, as the number of 
data sets simulated from the parametric model and the sample size of the 
observed data set increase simultaneously. We illustrate the performance of 
AABC on a population-genetic model of natural selection, as well as on a 
model of the admixture history of hybrid populations. 

Keywords: Approximate Bayesian computation, likelihood-free methods, 
nonpar ametrics, posterior distribution 
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1 Introduction 



Stochastic processes motivated by mechanistic considerations enable inves- 
tigators to capture salient phenomena in modeling natural systems. Sta- 
tistical models resulting from these stochastic processes are often paramet- 
ric, and estimating model-specific parameters — which often have a natural 
interpretation — is a major aim of data analysis. Contemporary mechanistic 
models tend to involve complex stochastic processes, however, and paramet- 
ric statistical models resulting from these processes lead to computationally 
intractable likelihood functions. When likelihood functions are computation- 
ally intractable, likelihood-based inference is a challenging problem that has 



received considerable attention in the literature (Robert and Casella, 2004 



Liu, 2008). 



When statistical models are known only at the level of the stochastic 
mechanism generating the data — such as in implicit statistical models (Diggle 



and Gratton, 1984) — explicit evaluation of likelihoods might be impossible. 
In these models, standard computational methods that require evaluation of 
likelihoods up to a proportionality constant (e.g., rejection methods) cannot 
be used to sample distributions of interest. However, data sets simulated from 
the model under a range of parameter values can be used to assess parameter 



likelihoods without explicit evaluation (Rubin 



computation (ABC) methods (Tavare et al. 



1984). Approximate Bayesian 



1997; Beaumont et al. 2002 



Marjoram et al. |2003 ) implement this idea in a Bayesian context to sample an 
approximate posterior distribution of the parameters. Intuitively, parameter 
values producing simulated data sets similar to the observed data set arise 
in approximate proportion to their likelihood, and hence, when weighted by 
prior probabilities, to their posterior probabilities. 



1.1 The ABC literature 



ABC methods have been based on rejection algorithms (Tavare et al. 1997 



Beaumont et al. 


2002 


Blum and Francois 


2010) 


, Markov chain Monte Carlo 


( Beaumont 


200: 


5 


Vlarjoram et al. 2003 


Bortot et al., 


2007 Wegmann et 


al, 2009), and sequential Monte Carlo (Sisson et al., 2007 


2009 Beaumont 


et al. 


20091 r 


Ibni et al 


200£ 


). Model selection using ABC 


(Pritchard et al. 


19991 


7 agundes et al. 


20071 


Glrelaud et al. 


2009 


Blum and Jakobsson 


, 2010 


Robert et al 


. 2011 


), the choice of summary statistics when the likelihooc 



is based on summary statistics instead of the full data (Joyce and Marjo- 
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ram 



2008} Wegmann et al, 2009 Nunes and Balding, 2010; Fearnhead and 



Prangle, 2012), and the equivalence of posterior distributions targeted in dif- 



2008; Sisson et al., 2010) have also been 



ferent ABC methods (Wilkinson 
investigated. 

ABC methods have had a considerable effect on model-based inference 
in disciplines that rely on genetic data, particularly data shaped by diverse 
evolutionary, demographic, and environmental forces. Example applications 



have included problems in the demographic history of populations (Pritchard 



et al.\ |1999[ |Frangois et al. 2008[ Verdu et al. 20091 Blum and Jakobsson[ 



2010 


) and species ( 


Estoup et al., 


2004: 


Plagnol and Tavare 


2004; 


Becquet 


and Przeworski 


2007 


Fagundes et al. 




>007 


Wilkinson et al. 


2010 


, as well 



as problems in the evolution of cancer cell lineages (Tavare 



2005 



Siegmund 



et al. 2008) and the evolution of protein networks (Ratmann et al. 2009) 



Other applications outside of genetics have included inference on the physics 
of stereological extremes (iBortot et al. , 2007), the ecology of tropical forests 



(Jabot and Chave, 2009), dynamical systems in biology (Toni et al. 2009), 



and small- world network disease models (Walker et al. , 2010). ABC methods 



have been reviewed by Marjoram and Tavare (2006), Cornuet et al. (2008), 



Beaumont et al.\ (|2009|), |Beaumont| (|2010j), |Csillery et al.\ (|2010j), and 
et al] (120111). 



Marin 



1.2 A limitation of ABC methods 

An informal categorization of the information available about the likelihood 
function is helpful to illustrate the class of models in which ABC methods 
are most useful. First, exact inference on the posterior distribution of the 
parameters is possible only if the likelihood function is analytically available. 
Second, if the likelihood function is not analytically available but can be eval- 
uated up to a constant given a parameter value, then standard computational 
methods such as rejection algorithms can sample the posterior distribution. 
In this case, inference is exact up to a Monte Carlo error due to sampling 
from the posterior. Third, if the likelihood function cannot be evaluated, 
but data sets can feasibly be simulated from the model, then ABC methods 
sample the posterior distribution using approximations on the data space in 
addition to a Monte Carlo error due to sampling. 

Although ABC methods sample the posterior distribution of parameters 
without evaluating the likelihood function, they are computationally inten- 
sive. Adequately sampling a posterior distribution of a parameter by ABC 
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requires many random realizations from the prior distribution of the param- 
eter and the sampling distribution of the data. Simulating from the prior 
is straightforward, but the computational cost of simulating a data set from 
the mechanistic model increases quickly with the complexity and number of 
stochastic processes involved. Henceforth, we refer to statistical models in 
which not only evaluating the likelihoods is difficult but also simulating a 
large number of data sets is computationally infeasible as limited- generative 
models. When a model is limited-generative and only a small number of 
data sets can be simulated from the model, likelihoods cannot be assessed 
using ABC and hence, the posterior distribution of parameters cannot be 
adequately sampled. 

1.3 Our contribution 

In this article, we introduce approximate approximate Bayesian computa- 
tion (AABC), a class of methods that perform inference on model-specific 
parameters of limited-generative models when standard ABC methods are 
computationally infeasible to apply. In AABC, the idea of assessing the like- 
lihoods approximately using simulated data sets is taken one step further 
than in ABC. AABC methods make approximations on the parameter space 
and the model space in addition to standard ABC approximations on the 
data space. In conjunction with Bayesian resampling methods, these ap- 
proximations help us overcome the computational intractability associated 
with simulating data from a limited-generative model (Figure [T]). 

Our key innovation is to condition on a limited number of data sets that 
can be feasibly simulated from the limited-generative model and to employ 
a non-mechanistic statistical model to simulate a large number of data sets. 
We set up the non-mechanistic model based on empirical distributions of the 
limited number of data sets simulated from the mechanistic model. Since 
the data values from the limited number of simulated data sets are used to 
construct new random data sets by resampling methods, it is computationally 
inexpensive to simulate a large number of data sets in AABC. The AABC 
approach allows a researcher to allocate a fixed computer time to simulating 
a limited number of data sets from the limited-generative model, thus making 
otherwise challenging likelihood-based inference attainable. 

Intuitively, the information conditioned upon by the non-mechanistic 
model increases with the number of data sets simulated from the mecha- 
nistic model, and the expected accuracy of inference obtained by AABC 
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Simulation-based methods for model-based Bayesian in 
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i 
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* 
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( 
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No 
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the model computationally feasible? 

No 



Nonparametric inference unrelated 
to the model-specific parameters 



' Samples relumed by reicctic- alaoritrvr arc "■dc::.c , dcnt, whereas MCMC returns dependent samples 



Figure 1: Applicability of simulation-based inference methods in relation to 
the information available about the likelihood function. 



methods increases. We formalize this intuition by showing that the posterior 
distribution of parameters obtained by AABC converges to the correspond- 
ing posterior distribution obtained by standard ABC, as the sample size 
of the observed data set and the number of data sets simulated from the 
limited-generative model increase simultaneously. 

AABC methods utilize the established machinery of ABC methods in 
sampling the posterior distribution of the parameters. Therefore, standard 
approximations on the data space involved in an ABC method — which facil- 
itate the sampling of the posterior distribution — apply to AABC methods as 
well. We now briefly review these approximations in the context of ABC by 
rejection algorithms. 



2 Review of ABC by rejection algorithms 

To more formally set up the class of problems in which ABC methods are 
useful, we assume that a parametric model generates observations conditional 
on parameter 9 e = R p , p > 1. We let Pq be the sampling distribution 
of a data set of n observations independent and identically distributed (IID) 
from this model. We denote a random data set by x = (xi,x 2 , ■■■,x n ) G X, 
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where X is the space in which the data set sits, and the observed data set by 
x D . In the genetics context, a data point X{ might be a vector denoting the 
allelic types of a genetic locus at genomic position % in a group of individuals; 
the data matrix x might then contain genotypes from these individuals in a 
sample of n independent genetic loci. 

Suppose that Pg is available to the extent that the likelihood function 
p(x o |0) can be evaluated up to a constant whose value does not depend on 
the parameters. Given a prior distribution tt(9) on parameter 9, the pos- 
terior distribution of 9 given the observed data x D under the model Pg is 
7r(0|x o , Pg). Then 7r(0|x o , Pg) can be sampled by standard rejection sampling 
from p(x. \9)7i(9), a quantity that is proportional to n(9\x. ,Pg) by Bayes' 
Theorem. In principle, sampling Tr(9\x. ,Pg) without evaluating the likeli- 
hood function p(pc \9) is possible, if simulating the data from the model Pg 
is feasible. An early example due to Tavare et al. (1997) samples n(9\x. , Pg) 
by accepting a value 9i simulated from the prior it {9) only if the data set 
Xj simulated from Pg. satisfies Xj = x D . By standard rejection algorithm 
arguments, the 9i sampled in this fashion are from the correct posterior dis- 
tribution. However, the acceptance condition x^ = x D is rarely satisfied with 
high-dimensional data. A first approximation in ABC methods is dimen- 
sion reduction by substituting the data set x with a low- dimensional set of 
summary statistics s. The observed data x Q and the simulated data Xj are 
substituted by s Q and s iy calculated from their respective data sets. This 
is equivalent to substituting the likelihood function of the data p(x.\9) with 
the likelihood function of the summary statistics p(s\9). Since ABC is most 
useful in statistical models that do not admit sufficient statistics, dimension 
reduction to summary statistics often entails information loss about the pa- 
rameters. The choice of summary statistics minimizing this information loss 



is an active research area ( 


Joyce and Marjoram 


2008 


Wegmann et al. 


. 2009 


Robert et al., 


2011 


Aeschbacher et al. , 


2012 


Fearnhead and Prangle 


2012) 



When the data are substituted with summary statistics, the acceptance 
condition Xj = x D is substituted by Sj = s a , but exact equality may still 
be too stringent a condition to be satisfied with simulated data. A second 
approximation in ABC is to relax the exact acceptance condition with a 
tolerance acceptance condition. For example, Pritchard et al. (1999) used 



the Euclidean distance 1 1 • 1 1 and a small tuning parameter e to accept a value 
9t from an approximate posterior distribution if the data set Xj simulated 
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from P g . produced Sj satisfying 



1/2 



'"J ■ 



< e, 



(1) 



where s is a fc-dimensional statistic 

of Si and s c , respectively (see also Weiss and Von Haeseler 



and Sij and s Q j are the jth components 



(1998) for an 



application in a pure likelihood inference context). Distance metrics other 



than the Euclidean distance, such as the total variation distance (Tavare et 
al 



2002), have also been used. 



Substituting the binary accept/reject step in the rejection sampling by 
weighting s, smoothly according to its distance from s D using a kernel density 
K e (sj, s Q ) with bandwidth e leads to importance sampling (Wilkinson, 2008). 
The tolerance condition | |sj — s \\ < e in the rejection algorithm of Pritchard 



et al. (1999) then corresponds to using a uniform kernel on an e-ball around 



s a . Other approaches to kernel choice include Epanechnikov (Beaumont et 
al 



2002) and Gaussian (Leuenberger and Wegmann, 2010) kernels. 



When the data likelihood is substituted by the likelihood based on the 
summary statistics and a tolerance condition with a uniform kernel and the 
Euclidean distance is used, the posterior distribution sampled with ABC by 
rejection is 



7l e (9\x ,Pg 



1 



c, 



1^ 



}P (x\6)n(9) dx, 



(2) 



x 



where Ia is an indicator function that takes a value of 1 on set A and is zero 
otherwise, and Cp g = J e J x l^\ s ^ So \\ <e yp('x\9)ii(9) <ix d6 is the normalizing 
constant. A standard ABC algorithm that samples 7r e (0|x o , Pg) appears in 
Figure |2j 

The choice of summary statistics, tolerance parameter e, distance func- 
tion, and kernel constitute approximations on the data space in ABC meth- 
ods. We assume that these standard ABC approximations work reasonably 
well, and we focus on new modeling approximations on the parameter and 
model spaces introduced by AABC (Figure [3]). 
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Algorithm 1 : ABC by rejection algorithm 
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1. 


Simulate 0; ~ 


i i 


1 


1 






lp(x|«l) |p(x|0 2 ) 


lKx|e m ) 


I J>(x|0m) 


2. 


Simulate x, ~ P 0i . 


Xi x 2 




xm 






1 1 

si s 2 


1 

■ s m 


1 


3. 


Calculate the summary statistics s ; from x. ; . 


I 1 

? ? 


I 

7 


t 


4. 


If \\s„ — Sj | < e, output . 



Figure 2: The ABC algorithm by rejection sampling. One iteration of the 
algorithm is shown on the right along with a schematic illustration of sam- 
pling from the posterior distribution of 9 based on M proposed parameter 
values (left). 

3 Approximate approximate Bayesian com- 
putation (AABC) 

Algorithm 1 returns an adequate sample size from the posterior distribu- 
tion of a parameter if it is iterated a large number of times, M. The set of 
realizations simulated from the joint distribution of the parameter and the 
data by steps 1 and 2 of Algorithm 1 is then {(xi, 9\), (x 2 , #2), (xm, 9m)}- 
AABC methods seek inference on parameter 9 when the model Pq is limited- 
generative, and simulating M data sets under Pq is therefore computation- 
ally infeasible. We thus assume that only a limited number m of data sets 
xi, x 2 , x m can be obtained by step 2 of Algorithm 1 (m M). We denote 
the set of realizations simulated from the joint distribution of the parameter 
and the data by Z n ^ m = {(xi, #1), (x 2 , 6*2), (x m , 9 m )}, where each data set 
Xj of n IID observations is simulated from the model Pq v 

In AABC, we substitute the joint sampling distribution P 9 of a data set 
of size n with the joint sampling distribution Qg, from which simulating data 
sets is computationally inexpensive. In replacing Pg with Qg, we require 
that the posterior distribution 7t(0|x o , Qg) based on the likelihood implied 
by model Qg approximates the posterior distribution 7r(6>|x , Pg) based on 
the likelihood implied by model Pg. Further, we require that Qg can be used 
with a wide range of Pg, in the sense that Qg is constructed without using 
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Figure 3: Approximations and errors involved in simulation-based ABC in- 
ference methods. Likelihood functions of the full data and the summary 
statistics are denoted respectively by p(x|0) and p(s\9). Exact ABC with 
full data involves only the Monte Carlo approximation due to sampling and 
thus is equivalent to a standard rejection algorithm. Summary statistics s 
are assumed not to be sufficient so that dimension reduction from x D to s Q 
results in an approximation. 



the details of model Pq. 

3.1 Approximations on the parameter and model spaces 
due to replacing Pg with Qg 

Two approximations are involved in substituting Pq with Q e . First, Z njTn 
includes only m parameter values 9\, 9 2 , 9 m under which data sets are 
simulated from Pq. After obtaining Z ni7n , for any new parameter value 9 
from the prior distribution under which we want to simulate a new data 
set, we substitute 9 with 9 such that (x, 9) e Z n ,m- The value 9 has the 
minimum Euclidean distance to the value 9 among all parameter values in 
Z n ^ m . More precisely, 9 = arg min||0j — 9\\. In essence, this approximation is 

equivalent to replacing the sampling distribution of the data set Pe with the 
sampling distribution Pf, we call this an approximation on the parameter 
space. However, this parameter space approximation is not sufficient to sim- 
ulate data sets efficiently, since the model Pq is still limited-generative after 
this substitution. 

As a second approximation, we substitute the model Pq with the empirical 
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distribution of the data set x that has already been simulated from P§ as 
(x, 9) G Z n ,m- Here, we assume a positive probability mass only on the data 
values observed in the set x. We call this an approximation on the model 
space because the model Pg is substituted with the empirical distribution of 
a data set simulated from Pg. 

To simulate a new data set x in AABC, we utilize a vector of positive 
auxiliary parameters (f) = 02, 0n), that satisfy Y^h=i & = 1- We let 0i 
be the probability that a random data value Xj G x is equal to a given value 
Xi found in the data set x = (xi,x 2 , ...,x n ). The premise is that the sample 
x simulated under 9 provides information about the model Pg, and by an 
approximation of 9 to 9 on the parameter space, about Pg. 

If we denote the approximate sampling distribution of a data set x = 
(xi, 22, ■■■,x n ) by Qg, its joint probability mass function is 

/ ?(x|0,x)tt(0) d<f>I { g iS} , (3) 

where g(x|0,x) = ( ni n "... n J U"=i Yl7=i <fc' i= * } , and is 1 if 9 G Z n , m 
is the closest value to 9 in the Euclidean sense and is otherwise. Here, 
is the number of times x^ observed in the new sample x, k is the number of 
distinct data values observed in the data set x, and l^ x . =s ,^ is 1 if Xj = X{ 
and is otherwise. The distribution g(x|0, x) is that of an IID sample 
x = (xi, X2, x n ), where Xj is drawn from the values (xi, £2, x n ) with 
probabilities (0i, $2, 4>n)- 

The probability vector is a parameter of the model conditional on x, 
and thus, we need to posit a prior distribution on <p. As a natural prior on 
probabilities, we let the prior distribution tt(</>) on <p be the symmetric Dirich- 
let distribution on the (n — 1) -dimensional simplex $, with hyperparameters 
(1,1,..., 1) and a uniform probability density function proportional to 1. This 
choice assigns equal weight to all distributions placing positive probability 
mass on the data points G x. Further, it assigns zero posterior probabil- 
ity to data values unobserved in the sample x, thereby avoiding difficulties 



created by such values in the likelihood (Rubin 1981 Owen, 1990). 

To distinguish the parameter and data set realizations in Z n ^ m = {(x i; 6 1 ,)}™ 
from the parameter and data sets simulated using AABC, we use starred 
versions of each quantity to denote specific values simulated in AABC. For 
example, as the sampling distribution Pg. delivers a data set Xj under a given 
parameter value 9j in the ABC procedure of Algorithm 2, the sampling distri- 
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Figure 4: Notation used in the text and algorithms. 



bution Qo* delivers a data set x* under a given parameter value 8* simulated 
from its prior distribution (see Figure [1] for notation). 

The sampling distribution Qg utilizes the information available in the set 
of realizations Z n<m through the parameter (f>, since the prior distribution of 
conditions on (x, 8) G Z n/m and thus on the set Z n ^ m . In this sense, the 
available realizations Z n ^ m are used as fixed background information about 
Pg, and inferences using the substitute model Qg are conditional on the sim- 
ulated sets Z n ^ m . 

3.2 The posterior distribution of 9 sampled by AABC 

In sampling the approximate posterior distribution of 8 by AABC meth- 
ods, we use the two ABC approximations described in Section [2] First, we 
substitute each data instance x with summary statistics s. Second, we use 
an acceptance condition with tolerance e, employing the Euclidean distance 
to measure the proximity of the summary statistics calculated from the ob- 
served and simulated data, as in equation [TJ If we let 8* be a new parameter 
value simulated from its prior distribution after obtaining the set Z ntTtl , in 
AABC we accept the parameter values 8* producing summary statistics s* 
that satisfy the condition \\s* — s \\ < e as being draws from the posterior 
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distribution. This acceptance condition corresponds to a uniform kernel, 
which we use throughout this article, although like ABC, AABC can em- 
ploy other kernels to obtain smooth weighting of s* values by their distance 
from s a . Substituting P g with Q e involves replacing p(x|0) in expression [2 
with expression [3] and adjusting the normalizing constant accordingly. The 
approximate posterior distribution sampled by an AABC method is 



7r e (0|x o ,Q 6 



X 



-s ||<e} 



g(x|0,x)vr(0) dcj) I {e §} 



7i (9) dx, (4) 



where C Qg = f e J x l{\\ s - ao \\<e} 



tt(6) <ix d9 is the 



/*9(x|0,x)7r(0) dcf) l {6 
normalizing constant. 

The AABC approach is sensible in that as the limited generative model 
increasingly permits a larger number of simulated data sets, for large sample 
sizes the posterior distribution obtained by an AABC method approaches the 
same distribution as the posterior distribution obtained by an ABC method. 
We codify this claim with a theorem. 



Theorem. Let n(9) be a bounded prior on 9. Let 7i e (9\x. , Pe) and 7r e (0|x o , Qe) 
be the posterior distributions sampled by a standard ABC method and an 
AABC method, respectively. Then 

lim lim -k e {6\x. , Q e ) = lim ir e (9\x , P e ). (5) 

m— >oo n— »oo n— ¥00 

A proof of the theorem is given in Appendix 1. The convergence of the 
posterior distribution sampled by AABC is a consequence of the fact that, 
for each given value of 9, the sampling distribution q(x\(/>, x)7r(0) d<p ^ 
converges to the true sampling distribution ^(xl^) as the sample size n and 
the number of simulated samples m from Pe increase. The intuition for the 
double limit in equation [5] is as follows. The standard notion of a distibution 
converging to a point in the parameter space as the sample size n increases 
does not directly apply to the posterior distribution 7r e (0|x o , Q e ), since this 
posterior depends not only on the sample size n, but also on the number 
m of simulated data sets from Pg. Hence, for convergence of the posterior 
distribution based on the likelihood of Qe, the requirement is that both n — > 
oo and m — > oo. As n — > oo, the empirical distribution converges to Pg, 
the correct sampling distribution with the incorrect parameter value 9. As 
m — > oo, the distance between the parameter value 9 under which we want 
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to simulate a new data set and the parameter value 9 G Z n>m closest to 
9 approaches zero. Therefore, taking both limits simultaneously results in 
convergence to the correct sampling distribution Pg. 

3.3 A ABC algorithms 

The structure of AABC algorithms sampling the posterior distribution in 
expression [4] can be conveniently summarized in three parts, as shown in 
AABC by a rejection algorithm (Figure [5]). In Algorithm 2, Part I involves 
obtaining a limited number of realizations from the joint distribution of the 
parameter and the data from the limited-generative model Pg. Part I simply 
involves the application of steps 1 and 2 from Algorithm 1, but only for m 
iterations. Part II involves simulating a new parameter value 9* from its 
prior distribution (step 4) and then simulating a data set x* from the model 
Qg* (steps 5, 6, 7), conditional on Z n m obtained in Part I. Part III involves 
comparing the summary statistics s* calculated from the simulated data set 
x* with the summary statistics s Q calculated from the observed data set x a , 
to accept or reject the parameter value 9*. The calculation and comparison 
of summary statistics follows the same procedure as in steps 3 and 4 of 
Algorithm 1. Hence, Part II of AABC by rejection has the novel steps 5, 6, 
and 7, whereas Parts I and III use the machinery of ABC by rejection from 
Algorithm 1. 

We can show that Algorithm 2 samples the correct posterior distribution 
7i e (9\x ,Qe). The probability of sampling a parameter value 9 in Algorithm 
2 is proportional to 

Y Y 7l ( 9 ) I {e,e} n (^)lW^ x)I { || s _ So || <e} 

s <p 

= Y Y 7r ^' ^ > ) 1 {e,e}^( x l^' *)!{[[•-•«,[[<«} 

s <fi 

« Y Y 7r ^' W^lk-'oIKe)} 

s <fi 

oc ir e (9\x ,Q e ), 

where the third line follows from the fact that the expression on the second 
line is the product of the likelihood under the model Qg and the prior, and 
therefore it is proportional to the posterior distribution of parameters based 
on the model Qg. 
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Algorithm 2: AABC by rejection algorithm 



Parti 


II 1 ' 

9\ 62 ■ m 




1 . For i = 1, 2, .... m, simulate 0$ ~ tt(#). 

2. For i = 1, 2, m. simulate x,. ~ . 

3. Let £ nm = {(Xi,0i) (Xm^m)}- 


Xi x 2 ■ ■ ■ x m 




2 


n, m = {(x I ,^)>r=i 


Part II 


«(.$) 

I ll' 1 

#1 S 2 fl m 0*M 


J. Simulate flj~7r(0). 

5. Find x» = (xii, Xi2, Xj n ), where (x,;. 0,;) €.2n,m and 0* is the 
closest value to 6*. 

6. Simulate 4>i ?r(0)' 

7. Simulate a new data set x* = {x* 1; x*2i ...,x* n ), where .x^arc IID from 
values (%n , Xi2, -•, Xin) with the probabiiity of proportional to ^ . 


III 1 
Xi X 2 ■ ■ ■ Xm ■ ■ ■ Xy 












1 

01 


1 

02 • • ■ 


I 1 

0m 


J 




1 


4 ■■■ X 


Tl ' ' ' X 




Part III 


111 1 
s* *2 ■ • ■ Sm " " " SM 


8. Calculate the summary statistics s* from x*. 

9. If \\s — s* < e, output (9*. 


III 1 



Figure 5: The AABC algorithm by rejection sampling. One iteration of 
the algorithm is shown on the right, along with a schematic illustration of 
sampling from the posterior distribution of 9 based on M proposed parameter 
values in the rejection algorithm (left). 



4 Applications 

In this section, we investigate the inferential performance of AABC approach 
with two examples. The following simulation setup is used in both examples. 

4.1 Simulation study design 

We simulated a reference set with M = 10 5 realizations {(xi, 9\), (x 2 , 9 2 ), (x 10 5, 9 
by first generating 9, t ~ ir(9) and then simulating a data set Xj ~ P$ v We 
then sampled 1000 pairs (xj, 9i) from the reference set, uniformly at ran- 
dom without replacement. Thus, we selected 1000 "true" parameter val- 
ues 9i, along with corresponding test data sets Xj generated under each 
value 9i from the model Pg i . Further, we built the sets Z nm , with m = 
10 2 , 5 x 10 2 , 10 3 , 5 x 10 3 , 10 4 , 5 x 10 4 , 10 5 by sampling the reference set uni- 
formly at random without replacement for m < 10 5 , and taking all the real- 
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izations in the reference set for m = M = 10 5 . The sample size n of the data 
is described in each relevant example. 

On each test data set, we performed AABC by rejection sampling (Algo- 
rithm 2) using each set Z n ^ m . In example 1, where our goal is to compare the 
performance of the AABC and ABC approaches, we performed ABC anal- 
yses by rejection sampling (Algorithm 1) using the same sets Z n m . For all 
analyses, we obtained a sample from the joint posterior distribution of the 
parameter vector 9 by accepting the parameter vector values that generated 
data whose summary statistics were in the top 1 percentile with respect to 
the statistics calculated from the test data set, in the sense of equation [T] 
Compared to the approach of fixing the e cutoff, accepting parameter vec- 
tors that generate data whose summary statistics are in a top percentile has 
the advantage that a desired number of samples from the posterior is always 
obtained given a total fixed number of proposed parameter values. This ap- 
proach is often preferred by ABC practitioners and is convenient in our case 
for comparing ABC and AABC. 

We assessed the accuracy of the posterior samples for each component 
of the parameter vector 9 separately, using the root sum of squared error 
for standardized parameter values accepted in the posterior sample. For 
a generic scalar parameter a, the root sum of squared errors is given by 



rsse = (iao^e; 

=i( a o ~ aT) 2 /Var(a), where a — (a±, ac r ) are r 

accepted values in the posterior sample, a-p is the true parameter value, and 
Var(a) is the variance of the set of r values. We report the mean RSSE 
over 1000 test data sets as RMSE = (1/1000) Yn=i RSSE, (see ~ 
BaldinglpHOl )). 



Nunes and 



4.2 Example 1: The strength of balancing selection in 
a multi-locus K -allele model 

In this section, we consider inference from the stationary distribution of 
allele frequencies in the diffusion approximation to a Wright-Fisher model 



with symmetric balancing selection and mutation (Wright, 1949). If we let 



di > 0, with i = 1,2,...,K, and ^ i=1 &\ = 1, and denote the frequency 
of allelic type i in the population at a genetic locus, the joint probabil- 
ity density function of allele frequencies x = (oi, a 2 , or-) is f(x\a,p) = 
c(a, /i) -1 exp(— a ^2f =1 of) Y\f =l a!*' K ~ l . Parameters a and /i determine the 
population-scaled strength of balancing selection and the mutation rate, re- 
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spectively. A data set of observed allele frequencies is a random sample of n 
draws from the population frequencies f(x\a, //). 

ABC methods are well-suited for inference from this model for three rea- 
sons. First, the statistics Y^j=i a "j an d — Ylj=i ^°& a j are jointly sufficient for 
parameters a and /i, and no information loss occurs in dimension reduction 
to the summary statistics. Second, the parameter-dependent normalizing 
constant c(a, //) is hard to calculate, and performing likelihood-based infer- 
ence on a and // is therefore difficult. Third, a method specifically designed 
to simulate data sets from f(x\a, /i) is readily available (Joyce et al, 2012), 
and performing ABC is therefore straightforward. For simplicity, we assume 
100 loci with the same true parameter values, each with K = 4, and that the 
allele frequencies at each locus are independent of the allele frequencies at 
other loci. Thus, the joint probability density function of allele frequencies 
for 100 loci is equal to the product of probability density functions across 
loci. We choose uniform prior distributions, on (0.1, 10) for the mutation 
rate (p), and on (0,50) for the selection parameter (er). 

Results. Posterior samples model parameters (a, y) obtained by ABC 
and AABC using a typical data set are given in Figure |6j In analyses with 
m = 10 2 , 5 x 10 2 , 10 3 or 5 x 10 3 simulated data sets, few samples are accepted 
with ABC, and thus, little mass is observed in ABC histograms (black). For 
small m, ABC does not produce an adequate sample size from the posterior 
distribution of parameters. AABC, however, produces a posterior sample 
of size 10 3 for any m, because 10 5 data sets are simulated from the non- 
mechanistic model (Algorithm 2, steps 5, 6, 7) and the top 1 percentile 
are accepted as belonging to the approximate posterior distribution. The 
histograms obtained by AABC recover the true value reasonably well (Figure 
|6|. The RMSE values in AABC procedures are approximately constant with 
increasing m. For m = 10 2 , 5 x 10 2 , 10 3 , 5 x 10 3 , 10 4 , 5 x 10 4 , and 10 5 simulated 
data sets, the RMSE values for parameter \i are 5.988, 5.932, 6.012, 6.086, 
6.125, 6.078, and 6.088 respectively, close to the RMSE of 5.290 obtained 
by a standard ABC approach using M = 10 5 simulated data sets from the 
mechanistic model. The RMSE values in the last column of Figure [6] show 
that an AABC approach produces posterior samples that have on average 
greater variance than posterior samples obtained from ABC with the same 
large number of realizations. Here, greater variance in posterior samples 
obtained by AABC is a result of simulating data sets in AABC by resampling 
the observed data values that are found only in the m realizations in Z n ^ m . 
Consider two parameter values 9\ and Q\ f° r which data sets and Xg are 
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Figure 6: Inference on the strength of balancing selection. The figure shows 
the marginal posterior distributions of parameters // (A), and a (B) of exam- 
ple 1 obtained with ABC by rejection (black) and with AABC by rejection 
(blue). The number m of data sets simulated from the mechanistic model 
for each analysis performed by AABC and ABC appears at the top of each 
column. The red dot on the x-axis is the true value of the parameter, equal in 
all plots. RMSE* values in each plot are from AABC analyses, averaged over 
1000 test data sets. RMSE values in the last column are from corresponding 
ABC analyses. 



simulated in the AABC approach by steps 5, 6, 7 of Algorithm 2 such that 
the parameter value 9 G Z n ^ m closest to both 6\ and Q\ is the same value. The 
data sets and x^ can include only the data values observed in x of the pair 
(x,0) G Z n>m . On average, x^ and x^ share more observations in common 
than two data sets simulated from the respective mechanistic models Pg* 
and Pg*. Therefore, each data set simulated in the AABC approach using 
Qe is expected to be less able to distinguish between different parameter 
values than the independent data sets simulated in the ABC approach using 
Pg. This situation results in relatively flat likelihoods and hence posterior 
samples with larger variance. 
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4.3 Example 2: Admixture rates in hybrid populations 



Models in which hybrid populations are founded by, and receive genetic con- 
tributions from, multiple source populations are of interest in describing the 
demographic history of admixture. Stochastic models including admixture 
often result in likelihoods that are difficult to calculate, and statistical meth- 
ods capable of performing inference on admixture rates have received much 
attention for their implications on topics ranging from human evolution to 



conservation ecology (Falush et al, 2003 Tang et al., 2005 Buerkle and 



Lexer, 2008). Here, we consider inference on admixture rates from a mecha- 



nistic model of Verdu and Rosenberg (2011). We use reported estimates of 



individual admixture as data. 

We consider a model of admixture for a diploid hybrid population of 
constant size N, founded at some known t generations in the past with con- 
tributions from source populations A and B. We follow the distribution of 
admixture fractions of individuals in the hybrid population at a given ge- 
netic locus. Each generation, the admixture fraction for each individual 
in the hybrid population is obtained as the mean of the admixture frac- 
tions of its parents. The parents are chosen independently of each other, 
from source population A, source population B, or the hybrid population 
of the previous generation with probabilities Pa,Pb, and pn, respectively 
(Pa + Pb + Ph = 1)- In the special case of the founding generation, p H = 0, 
and we assume Pa = Pb = 0.5. Individuals from source populations A and 
B are assigned admixture fractions of 1 and respectively. For example, if 
both parents of an individual in the hybrid population of the founding gen- 
eration are from source population A, that individual has admixture fraction 
(1 + l)/2 = 1. If both parents are from population 2, the admixture fraction 
is (0 + 0)/2 = 0, and if one parent is from population 1 and the other is 
from population B, then the admixture fraction is (1 + 0)/2 = 0.5. The dis- 
tribution of the admixture fraction in the hybrid population is propagated 
in this manner for t generations until the present, in which a sample of n 
individuals is obtained from the resulting distribution (Figure [7]). Our goal is 
to estimate the admixture rates (pa,Pb,Ph), given the individual admixture 
fractions estimated from observed genetic data. 

We apply the AABC approach using individual admixture fractions from 
n = 604 individuals from Central African Pygmy populations reported by 
Verdu et al. (2009), with an assumed constant population size of N = 10 4 . 



This assumption differs slightly from the original model in |Verdu and Rosen- 
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sample of n 
individuals 



Figure 7: The admixture model of example 2. 



berg (2011) in that a finite population size is assumed, so that only 10 4 



admixture fraction values are allowed in the population at any given gen- 
eration. We assume that an admixture event with contributions from two 
ancestral source populations started at the mean estimate of t — 771 gen- 



erations ago (Verdu et al. 2009) with a generation time of 25 years, and 



that it continued until the present. Source population A refers to an an- 
cestral Pygmy population, and source population B refers to an ancestral 
non-Pygmy population. The feature of this model relevant to our method 
is the computational intractability of simulating data sets. For each set of 
parameter values (pa,Pb,Ph) simulated from the priors, the distribution of 
admixture fractions is discrete on a support of a number of admixture frac- 
tion values that doubles each generation, and this distribution evolves for 
771 generations. A random sample of admixture fraction values comparable 
to the values calculated from the observed data set is obtained from the dis- 
tribution of the present generation. Simulating a large number of data sets 
under this model with such a large number of generations is computationally 
infeasible, and standard ABC is impractical. We thus perform AABC by re- 
jection (Algorithm 2) using m = 10 4 realizations from this model. We assume 
a Dirichlet prior with hyperparameters (1, 1, 1) on parameters {pa,Pb,Ph)- 

We also assessed the contribution of the approximations on the parame- 
ter and model spaces in the AABC approach to the RMSE separately, with 
a simulation study using a small number of generations (t = 30), where 
simulating data sets from the mechanistic model is feasible. First, we per- 
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formed AABC with rejection as in Algorithm 2 with 1000 "true" data sets 
using m = 10 2 ,5 x 10 2 ,10 3 ,5 x 10 3 ,10 4 ,5 x 10 4 , and 10 5 realizations from 
the model, and we calculated the RMSE for pa,Pb, and pu over 1000 "true" 



data sets as described in Section [41] This AABC analysis includes error due 
to approximations on the parameter space and on the model space. Second, 
we performed an AABC analysis with the same set of m realizations, by in- 
cluding the error only due to the approximation on the parameter space. We 
achieved this by running Algorithm 2 up through step 5, and then simulating 
data sets from the mechanistic model by substituting steps 6 and 7 of Algo- 
rithm 2 with step 2 of Algorithm 1, the standard ABC approach by rejection. 
By this substitution, all data sets are simulated from the mechanistic model, 
but each data set is obtained using a parameter vector (Pa,Pb,Ph) found 
in step 5 of Algorithm 2. In this procedure, the error due to the approx- 
imation on the model space is eliminated, because data sets are simulated 
from the correct mechanistic model and not by resampling from the avail- 
able realizations in Z n m . However, this procedure includes error due to the 
approximation on the parameter space, because each data set is simulated 
not under the correct proposed parameter value, but under the parameter 
value (Pa,Pb,Ph), the closest value to the correct proposed value that can be 
found in Z n m . We compared the RMSE of the AABC procedure involving 
the approximation on both the parameter and model spaces and the RMSE 
of the AABC procedure involving only the approximation on the parameter 
space to the RMSE obtained from a standard ABC approach. For these two 
AABC procedures, we also compared the percent excess in RMSE, defined 
as the ratio of the absolute difference in RMSE of the AABC and standard 
ABC approaches to the RMSE of the standard ABC approach, expressed as 
a percent. 

Results. The individual admixture fractions calculated from the Pygmy 
data carry substantial information about the admixture parameters Pa,Pb, 
and ph, since the joint posterior distribution is concentrated in a relatively 
small region of the 3-dimensional unit simplex on which (pa,Pb,Ph) sits 
(Figure [8|\). The marginal posterior distributions (Figure [8^3, |8p, and[8£)) 
have means pa = 0.151, ps = 0.132, and pu = 0.717. These values are inter- 
preted as contribution of genetic material of 15.1% from the ancestral Pygmy 
population (source population A), 13.2% from the ancestral Non-Pygmy pop- 
ulation (source population B), and 71.7% from the hybrid population to itself 
at each generation, over 771 generations of constant admixture. 

For the simulation study with t = 30 generations and 1000 "true data" 
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Figure 8: AABC analysis on the Pygmy data of example 2 with m = 10 4 
realizations under the mechanistic model. (A) The joint distribution on 
the unit simplex, with probability mass increasing from white to dark red. 
(B,C,D) Marginal distributions of pa,Pb, and pu- 
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sets, the RMSE values from AABC analyses decrease with increasing m {Fig- 
ure|9]A., [9^3, |9p). Further, as m increases, the error due to the approximation 
on the parameter space decreases (Figure [9jZ> last column), due to the fact 
that for large m, the difference decreases between the closest parameter value 
chosen at step 5 of Algorithm 2 and the correct parameter value under which 
we want to simulate a data set. In fact, the RMSE from the AABC analysis 
with m = 10 5 realizations and approximation only on the parameter space 
and the RMSE from the standard ABC approach are virtually indistinguish- 
able (Figure |9j\, [9^, |9p, red star). For m = 10 3 , the AABC analysis with 
approximations on the parameter and model spaces has a percent excess 
RMSE of 13.81%, whereas AABC analysis including only the approximation 
on the parameter space has excess RMSE of 6.61%. That is, at m = 10 3 , 
approximately half of the excess RMSE in the AABC approach with respect 
to the standard ABC analysis comes from the error due to the approxima- 
tion on the parameter space and half arises due to the approximation on the 
model space. 



5 Discussion 

Performing likelihood-based inference from statistical models incorporating a 
multitude of stochastic processes is often challenging due to computationally 
intractable likelihoods. In principle, when stochastic processes are complex 
but a family of parametric statistical models is well-defined, data can be sim- 
ulated from the model to assess the parameter likelihoods. In the last decade, 
ABC methods have become a standard tool to perform approximate Bayesian 
inference in subject areas such as ecology and evolution, by exploiting the 
idea of simulating many data sets from a model, when such simulations are 
computationally feasible. To deliver an adequate sample from the posterior 
distribution of the parameters, however, ABC requires a large number of sim- 
ulated data sets, and it might not perform well when only a limited number 
of data sets can be simulated. 

In this article, we introduced an approach that extends simulation-based 
Bayesian inference methods to model spaces in which only a limited number 
of data sets can be simulated from the model, at the expense of requiring 
approximations on the parameter and the model spaces. Our AABC ap- 
proaches rely on two statistical approximations. In our approximation on 
the parameter space, for each parameter simulated from the prior distribu- 
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Figure 9: RMSE in the admixture model. The decrease in RMSE is shown 
for parameters pa (A), p B (B), and pu (C) with increasing m, the num- 
ber of simulated samples from the mechanistic model, for AABC analysis 
performed with an approximation only on the parameter space (green), and 
with an approximation on both the parameter space and the model space 
(blue). The red star in each plot is the RMSE obtained by a standard ABC 
analysis performed with M = 10 5 simulated values. (D) The percent excess 
in RMSE of the two AABC approaches relative to a standard ABC approach 
for parameters pa_,Pb, and pn- 
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tion, we take the closest parameter value available in the set of realizations 
Z>n,m obtained from the mechanistic model. This approach has a uniform ker- 
nel smoothing interpretation in the sense that each parameter value in the 
set Z nt m dissects the support of the prior distribution into non-overlapping 
components such that each interval is mapped to the same parameter value 
in Z njjn . Each component then represents the support of a uniform kernel. 
Kernel approximations have an operational role in implementing ABC meth- 
ods, and a natural future direction for AABC is to improve the accuracy of 
posterior samples using smooth weighting kernels for the approximation on 
the parameter space. 

The approximation on the model space is achieved by assigning Dirich- 
let probabilities to data points of realizations obtained from the mechanistic 
model. This is a variation on the resampling method originally introduced in 
Rubin's Bayesian bootstrap (Rubin, 1981), and therefore, it is an application 
of Bayesian nonparametric methods. From this perspective, AABC methods 
connect standard model-based Bayesian inference on model-specific parame- 
ters and Bayesian nonparametric methods within the ABC framework. 

Our approach of using a non-mechanistic model and Bayesian resam- 
pling methods to help perform inference on model-specific parameters of a 
mechanistic model is a fundamental difference between AABC and existing 
ABC methods. ABC performs inference on model-specific parameters of a 
mechanistic model using a likelihood based purely on the mechanistic model. 
AABC instead performs inference on the same model-specific parameters of 
the mechanistic model as ABC, using a likelihood based on a non-mechanistic 
model that incorporates a limited number of data sets simulated from the 
mechanistic model. Consequently, the model likelihoods used in ABC and 
AABC are not exactly the same, and the posterior distributions targeted by 
the two classes of methods are not exaxctly equivalent for finite sample sizes. 
The advantage of AABC methods in contrast to pure non-mechanistic mod- 
eling approaches (e.g., nonparametric methods) is that AABC can perform 
inference on the quantities of interest — the model-specific parameters of the 
mechanistic model. 

Unlike other ABC methods, the AABC approach delivers a posterior sam- 
ple of desired size from the joint distribution of parameters for any m > 1. 
This is both a strength and a limitation of AABC. The strength is that in 
practice, a researcher can fix m and thus the computation time a priori, to 
simulate data from the mechanistic model to obtain a reasonable inference 
by AABC; other ABC methods may fail to produce an adequate posterior 
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sample in equivalent computation time. In our example, for moderate values 
of m (e.g., 10 3 to 10 4 ) for which standard ABC approaches were unsatis- 
factory, AABC adequately sampled an approximate posterior distribution. 
The limitation is that when m is too small, the posterior sample obtained by 
AABC can be a distorted representation of the true posterior distribution. 
Although in the limit, AABC and ABC are expected to produce similar re- 
sults, the posterior distribution sampled by an AABC approach is not the 
correct posterior distribution, because many parameter values simulated from 
the prior are tested for acceptance based on repeated use of the data values 
in m realizations, instead of based on data sets simulated independently of 
each other. A future direction is to investigate the relationship between m 
and the dimensionality of the parameter space to optimize m in producing a 
given level of accuracy for approximating the true posterior distributions. 
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Appendix 1 

We let k < n be the number of distinct values Xi,x 2 , ...,Xk in the data set 
x, and denote the number of observed Xi by hi, where n = Yli=i ™i- Then 
the prior distribution for the probabilities of an AABC replicate data set 
based on the ABC simulated data set x is the Dirichlet distribution ir(<f>) = 

[ r (Z)i=i^)/nLi r (^)] Yli=i < P ni ~ 1 witn parameters Hi, n 2 , n k . The spe- 
cial case of the prior proportional to 1 described in the text is obtained with 
k = n, when all observations in x are distinct (hi, = h 2 = ■ ■ ■ — h n — 1). 
Our goal is to show that lim^oo lim^oo 7r e (0|x o , Q e ) = lim^oo 7r e (^|x , P e ). 
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Recalling equation |4| 
lim lim 7r e (#|x , Q e ) = lim lim — - / 

m— >oo n— >oo m-4oon->ooOg ( J ^ 



|s- So ||<e} 



g(x|0,x)7r(0) d4> \ {e - g} 
(6) 



The integral in the brackets is the expectation of g(x|0, x), with respect to 
the prior 7r(<p). We let C = ( nj n ™ ) , and using the definition of g(x|0, x) 



C rii=i EHLi ^ * <} insertion 
we get 



3.1 



and vr(0) = [r(£* = i *)/ UU r(n,)] EEta ^ 



n- =1 ri 



7? ; 



3=1 



n 

vi=l 



1 



d(j>. 



Here, we have exchanged the order of the product over j with the integral 
since the expectation of the product of n IID observations in sample x is 
equal to the the product of the expectations of observations Xj. We label the 

realized value of the jth data point Xj by (j) such that Yl7=i 
and write 



,/(xi<m),t«/» ( /</> =g ^fej n 



n-=i ri 
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dd>. (7) 
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Using j $ [nti .^ ( . )r(fii)]r(fi( . )+1) Ul i= wO) ' ' 



» ^« d(j> = 1 (p. 487, 



Kotz 



et al. (2000)), we substitute the integral in equation ^ with the ratio of the 



gamma functions to get 



g(x|0,x)7r(0) dcf> = C Jtk^-\ 11 w 



^ |n-=i^ (i )r(ni) r(n w + i) 



n<=i r (^) j=i r [(E i= i,^(i) n*) + "(j) + 1] 



3=1 



T(n) r(n 0) + i 
r(n y) ) r(n + i) 



3=1 



n 



Substituting C n?=i (~7r) f° r the integral in brackets in equation (6), we 
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have 



lim lim n 6 (9\x ,Q e ) = lim lim — — / I { || s _ So || <e} C TT ( — J h e §xir(9) rfx 

m— >oo n— too m— too n— too LyQ g J% y Tl J ' 

lim lim f I { || s _ So ||< £} G f[ f — ) rfx 

m— too n— too J% \ Tl J 



lim lim Cq b 

m—too n—too 

(8) 

We apply the dominated convergence theorem to exchange the limits in 
n and the integrals in the numerator and denominator of equation (J8]). The 
assumptions of the theorem are satisfied as follows: 1) The integrand in equa- 
tion (|sj) is bounded: The indicator functions are bounded by 1, the ratios 
(ny)/n), where < n are bounded by 1, and the prior tt(9) is bounded by 
assumption. 2) lim n _ ) .oo(^(j)/ ri ) converges pointwise to the probability of X(j) 
under 9 and the model P§, given by p(x(j)\9), by the frequency interpreta- 
tion of probability. Exchanging the limits in n and the integrals, and using 
limr^+oo (ri(j)/n) = p(x {j) \9), 

lim / I { || s _ So || <e} TT [pO (j )|#)l 0> I {oS} n(9)dx 



lim lim 7r e (9\x ,Q e ) 

m-4oo n^cxi llffl Op. 

1 a 



lim / I { || s _ So || <e} p(x|^) I {9 M7i(9) dx 



lim Cp- 



(9) 



where (9 ) follows by the definition of the joint distribution p(x|0) = IIj=i P( x (j)W) 

We now apply the dominated convergence theorem a second time to ex- 
change the limits in m and the integrals on X. Again, the assumptions 
of the dominated convergence theorem are satisfied since the integrand in 
([9]) is a sequence in m of bounded functions, and as m — > 00, 9 — > 9, and 
p(x|0) -)■ p(x|0). We get 

lim lim 7r e (0|x o , Qg) = / I{|| s _ So ||< e }p(x|6»)7r(6») dx = lim 7T e (0|x o , P ) 

m— »oo n— >oo (-'Pg J A 1 n— >oo 

which shows that AABC posterior converges to the ABC posterior as the 
sample size n and the simulated number of data sets m increase. 
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